library(tidyverse)
library(DT)
library(mosaic)
library(plotly)
library(pander)
library(readr)

HSS <- read_csv("../../Data/HighSchoolSeniors.csv")

HSS_clean <- HSS %>%
  filter(!is.na(Importance_recycling_rubbish))

Introduction

Environmental awareness is a pressing concern for younger generations. This analysis examines high school seniors’ attitudes toward recycling and water conservation to determine if there is a statistically significant difference in their perceived importance. The question posed is:

“Do students who prioritize recycling over conserving water report higher importance ratings overall?”

By answering this question, we aim to uncover potential differences in environmental values among high school students. We will conduct a Welch two-sample t-test comparing the recycling importance ratings of two groups: students who rate recycling above water conservation, and students who do not. The null and alternative hypotheses are stated below.

Before analysis, rows with missing importance ratings (44 rows) were removed to ensure the integrity of the results. The cleaned dataset contained 193 observations for the Recycling More Important group and 262 observations for the Water Conservation More Important group, with no remaining missing values.
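Because the grouping used in this analysis compares two rating columns, a row needs both ratings present to be assigned to a group. The following is a minimal sketch of that completeness check. It uses a small made-up data frame (the values are illustrative only, not survey data) that borrows the column names from the dataset:

```r
# Toy data frame: values are invented for illustration, not survey data.
toy <- data.frame(
  Importance_recycling_rubbish = c(500, NA, 802, 647, 300),
  Importance_conserving_water  = c(304, 583, NA, 800, 300)
)

# Keep only rows where both ratings are present.
both_present <- complete.cases(
  toy[, c("Importance_recycling_rubbish", "Importance_conserving_water")]
)
toy_clean <- toy[both_present, ]

# Count how many rows were dropped and how many remain.
n_dropped <- sum(!both_present)
n_kept <- nrow(toy_clean)
```

Checking both columns at once keeps the reported number of removed rows in line with the number of rows that can actually be grouped.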

Hypotheses

Let \(\mu_1\) represent the mean recycling importance rating among students in the Recycling More Important group and \(\mu_2\) represent the mean recycling importance rating among students in the Water Conservation More Important group.

  • Null Hypothesis (\(H_0\)): \(\mu_1 = \mu_2\) (There is no difference in the mean importance ratings.)
  • Alternative Hypothesis (\(H_a\)): \(\mu_1 \neq \mu_2\) (The mean importance ratings are different.)

We will conduct the test at a significance level of \(\alpha = 0.05\).

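# Create a grouping variable for recycling vs. water conservation.
# Build on HSS_clean (not HSS) so the earlier missing-value filter is kept;
# ties fall into the "Water Conservation More Important" group.
HSS_clean <- HSS_clean %>%
  mutate(
    recycling_vs_water = ifelse(Importance_recycling_rubbish > Importance_conserving_water,
                                "Recycling More Important", "Water Conservation More Important")
  )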
summary_stats <- favstats(Importance_recycling_rubbish ~ recycling_vs_water, data = HSS_clean)
pander(summary_stats, caption = "Summary Statistics for Recycling vs. Water Conservation")
Summary Statistics for Recycling vs. Water Conservation

recycling_vs_water                  min  Q1   median  Q3   max    mean   sd     n    missing
Recycling More Important            4    500  647     802  10000  725.4  981.9  193  0
Water Conservation More Important   0    304  583     800  1000   563.5  309.5  262  0

These statistics show that students who prioritize recycling have a higher mean recycling importance rating (725.4) than students who prioritize water conservation (563.5). The medians are closer (647 versus 583). The large standard deviation for the recycling group (981.9, compared with 309.5) indicates high variability, likely driven by extreme values such as the maximum of 10,000, which is ten times the other group's maximum of 1,000.
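The two maxima in the table hint that ratings may have been collected on a 0 to 1,000 scale. That range is an assumption here, not something this report confirms. Under that assumption, a small hypothetical helper such as `flag_out_of_range()` could mark suspect entries for review before testing:

```r
# Hypothetical helper: flags ratings outside an assumed valid range.
# The 0-1000 range is an assumption inferred from the summary table.
flag_out_of_range <- function(x, lower = 0, upper = 1000) {
  !is.na(x) & (x < lower | x > upper)
}

# Example input: the five-number summary of the recycling group.
ratings <- c(4, 500, 647, 802, 10000)
flagged <- flag_out_of_range(ratings)

# Show which ratings were flagged.
ratings[flagged]
```

Flagged rows still need a judgment call (correct, drop, or keep). Rerunning the test with and without them would show how much the result depends on them.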

# Q-Q Plots for Recycling and Water Conservation
par(mfrow = c(1, 2)) # Side-by-side plots
qqnorm(HSS_clean$Importance_recycling_rubbish, main = "Q-Q Plot: Recycling")
qqline(HSS_clean$Importance_recycling_rubbish, col = "seagreen2")
qqnorm(HSS_clean$Importance_conserving_water, main = "Q-Q Plot: Water Conservation")
qqline(HSS_clean$Importance_conserving_water, col = "coral")

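The Q-Q plots, which are drawn for all students rather than separately for each group, suggest that the importance ratings for recycling and water conservation are approximately normally distributed apart from extreme values such as the recycling rating of 10,000. With group sizes of 193 and 262, the t-test is also reasonably robust to moderate departures from normality, so we proceed with the test.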

t_test_results <- t.test(Importance_recycling_rubbish ~ recycling_vs_water, data = HSS_clean)

pander(t_test_results, caption = "Two-Sample T-Test Results for Recycling vs. Water Conservation")
Two-Sample T-Test Results for Recycling vs. Water Conservation

Test statistic  df     P value    Alternative hypothesis
2.211           220.3  0.02805 *  two.sided

mean in group Recycling More Important  mean in group Water Conservation More Important
725.4                                   563.5

The t-test indicates a statistically significant difference between the two groups (\(p = 0.02805\)). Because \(p < \alpha = 0.05\), we reject the null hypothesis in favor of the alternative. Students who prioritize recycling rate its importance significantly higher, on average, than those who prioritize water conservation.
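As a check on the reported values, the Welch statistic and its degrees of freedom can be recomputed from the rounded summary statistics in the favstats table. This is a sketch from rounded inputs, so small discrepancies from the `t.test()` output are expected. The variable names are illustrative.

```r
# Rounded summary statistics from the favstats table.
m1 <- 725.4; s1 <- 981.9; n1 <- 193   # Recycling More Important
m2 <- 563.5; s2 <- 309.5; n2 <- 262   # Water Conservation More Important

# Variance of each group's sample mean.
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch t statistic: difference in means over its standard error.
t_stat <- (m1 - m2) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

round(c(t = t_stat, df = df_welch, p = p_value), 4)
```

Most of the standard error comes from the recycling group's variance term (`v1`), which is where an extreme rating such as 10,000 would have the most influence.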

# Create an interactive boxplot
plot_ly(
  data = HSS_clean,
  x = ~recycling_vs_water,
  y = ~Importance_recycling_rubbish,
  type = 'box',
  hoverinfo = 'text',
  text = ~paste(
    "Group: ", recycling_vs_water,
    "<br>Importance Rating: ", Importance_recycling_rubbish
  ),
  marker = list(outliercolor = "red", size = 6) # Highlight outliers
) %>%
  layout(
    title = "Interactive Importance Ratings by Environmental Priority",
    xaxis = list(title = "Group"),
    yaxis = list(title = "Importance Rating")
  )
## Warning: Ignoring 45 observations

Students who prioritize recycling assign it significantly higher importance ratings compared to students who prioritize water conservation. This suggests differing values in environmental priorities among high school seniors.

Conclusion

Students who prioritize recycling report significantly higher recycling importance ratings (mean = 725.4) than those who prioritize water conservation (mean = 563.5), and the difference is statistically significant (\(p = 0.02805\)). This suggests that seniors who place recycling above water conservation rate recycling more highly, though the high variability within the recycling group (standard deviation 981.9, maximum 10,000) indicates differing levels of conviction and a group mean that is sensitive to extreme ratings. Further research could explore factors contributing to these differences.